Web spam detection based on immune clonal feature selection and under-sampling ensemble

LU Xiaoyong, CHEN Musheng, WU Jhenglong, CHANG Peichan

Journal of Computer Applications 2016, 36 (7): 1899-1903. DOI: 10.11772/j.issn.1001-9081.2016.07.1899

Abstract

To address the "curse of dimensionality" and class imbalance, a binary classification algorithm based on immune clonal feature selection and an Under-Sampling (US) ensemble was proposed to detect Web spam. Firstly, the majority-class samples in the training dataset were sampled into several subsets, each of which was combined with the minority-class samples to form a balanced training subset. Then an immune clonal algorithm was used to select several optimal feature subsets, and the balanced training subsets were projected onto multiple views based on these feature subsets. Finally, several Random Forest (RF) classifiers were trained on these views of the balanced training subsets, and the class of each testing sample was determined by voting among them. The experimental results on the WEBSPAM UK-2006 dataset show that the ensemble classifier outperforms algorithms such as RF, Bagging with RF and AdaBoost with RF, with its accuracy, F1-Measure and AUC (Area Under ROC Curve) each increased by more than 11%. Compared with several state-of-the-art baseline classification models, the ensemble classifier increases the F1-Measure by 2% and achieves the best AUC.
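The pipeline described in the abstract can be sketched as follows. This is a minimal, self-contained sketch under several assumptions, not the authors' implementation. The majority class is split into disjoint chunks of roughly minority-class size (the abstract does not say exactly how the sampling is done). The immune clonal step is a simplified CLONALG-style search over binary feature masks driven by a hypothetical fitness function. A nearest-centroid learner stands in for Random Forest so the example runs with the Python standard library only. All function names (`under_sample_subsets`, `clonal_feature_selection`, `train_centroid`, `train_ensemble`, `vote`) and the synthetic data are hypothetical.

```python
import random
from collections import Counter

# Hypothetical layout: a sample is (feature_list, label), label 1 = spam, 0 = normal.


def under_sample_subsets(majority, minority, rng):
    """Shuffle the majority class, split it into k disjoint chunks of roughly
    minority size (assumed scheme), and join each chunk with all minority samples."""
    pool = list(majority)
    rng.shuffle(pool)
    k = max(1, len(pool) // len(minority))
    return [pool[i::k] + list(minority) for i in range(k)]


def clonal_feature_selection(n_features, fitness, rng, pop=10, gens=20,
                             n_select=3, clones=3, n_best=3):
    """Simplified CLONALG-style search over binary feature masks (assumed
    operators, not the paper's). Returns up to n_best distinct non-empty masks."""
    def random_mask():
        return tuple(rng.random() < 0.5 for _ in range(n_features))

    population = sorted((random_mask() for _ in range(pop)), key=fitness, reverse=True)
    for _ in range(gens):
        offspring = []
        for rank, parent in enumerate(population[:n_select]):
            # Better-ranked antibodies get more clones and a lower bit-flip rate.
            flip_p = 2 * (rank + 1) / ((n_select + 1) * n_features)
            for _ in range(clones * (n_select - rank)):
                offspring.append(tuple((not b) if rng.random() < flip_p else b
                                       for b in parent))
        # Receptor editing: two random newcomers keep the population diverse.
        newcomers = [random_mask() for _ in range(2)]
        population = sorted(population + offspring + newcomers,
                            key=fitness, reverse=True)[:pop]
    best = []
    for mask in population:
        if any(mask) and mask not in best:
            best.append(mask)
    return best[:n_best]


def project(x, mask):
    """Keep only the features switched on in mask (one 'view' of the sample)."""
    return [v for v, keep in zip(x, mask) if keep]


def train_centroid(samples, mask):
    """Stand-in base learner (nearest class centroid); the paper uses Random Forest."""
    sums, counts = {}, Counter()
    for x, y in samples:
        v = project(x, mask)
        acc = sums.setdefault(y, [0.0] * len(v))
        for i, val in enumerate(v):
            acc[i] += val
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        v = project(x, mask)
        return min(centroids, key=lambda c: sum((a - b) ** 2
                                                for a, b in zip(v, centroids[c])))
    return predict


def train_ensemble(subsets, masks):
    """One base model per (balanced subset, feature mask) view."""
    return [train_centroid(s, m) for s in subsets for m in masks]


def vote(models, x):
    """Majority vote over the base models' predicted labels."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


# Demo on synthetic data (not WEBSPAM UK-2006): feature 0 separates the
# classes, features 1-4 are noise; normal samples outnumber spam 4:1.
rng = random.Random(0)


def make(label, n):
    return [([label * 3 + rng.gauss(0, 0.5)] + [rng.gauss(0, 1) for _ in range(4)], label)
            for _ in range(n)]


normal, spam = make(0, 60), make(1, 15)
subsets = under_sample_subsets(normal, spam, rng)
validation = make(0, 20) + make(1, 20)


def fitness(mask):
    """Hypothetical affinity: validation accuracy minus a small size penalty."""
    if not any(mask):
        return 0.0
    model = train_centroid(subsets[0], mask)
    accuracy = sum(model(x) == y for x, y in validation) / len(validation)
    return accuracy - 0.01 * sum(mask)


masks = clonal_feature_selection(5, fitness, rng)
models = train_ensemble(subsets, masks)
print(len(subsets), "balanced subsets,", len(masks), "masks,", len(models), "voters")
print("prediction for [3, 0, 0, 0, 0]:", vote(models, [3.0, 0.0, 0.0, 0.0, 0.0]))
```

Each (balanced subset, feature mask) pair yields one view and one base model, so 4 subsets and up to 3 masks give up to 12 voters here. Replacing the stand-in learner with a real Random Forest would only change `train_centroid`; the under-sampling, view projection and voting steps stay the same.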
